https://github.com/BuzzFeedNews/2018-07-wildfire-trends/tree/master/data California fire report data, 1950 - 2017
https://www.macrotrends.net/states/california/population California Historical Population 1950 - 2020
I am interested in the topic, and the data seems to come from a reliable source with a great amount of historical detail about California fires.
I thought it would be interesting to see how wildfires have become so severe in recent years. Climate change is of course a big factor, but I believe other factors contribute as well. In this project I want to explore the relationship between wildfires and population in California.
import pandas as pd
df_ca_fire = pd.read_csv('cali_fire.csv')
df_ca_population = pd.read_csv('cali_population.csv')
# TODO: Use the info() method to inspect the variable (column) names, the number of non-null values,
# and the data types for each variable.
df_ca_fire.info()
df_ca_population.info()
# TODO: Use the head() method to inspect the first five (or more) rows of the data
df_ca_fire.head()
df_ca_population.head()
# TODO: Use the tail() method to inspect the last five (or more) rows of the data
df_ca_fire.tail()
df_ca_population.tail()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14847 entries, 0 to 14846
Data columns (total 18 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   objectid      14847 non-null  int64
 1   year_         14847 non-null  int64
 2   state         14844 non-null  object
 3   agency        14842 non-null  object
 4   unit_id       14837 non-null  object
 5   fire_name     11944 non-null  object
 6   inc_num       14262 non-null  object
 7   alarm_date    13230 non-null  object
 8   cont_date     7361 non-null   object
 9   cause         14805 non-null  float64
 10  comments      1875 non-null   object
 11  report_ac     7088 non-null   float64
 12  gis_acres     14841 non-null  float64
 13  c_method      7494 non-null   float64
 14  objective     14672 non-null  float64
 15  fire_num      12131 non-null  object
 16  shape_length  14847 non-null  float64
 17  shape_area    14847 non-null  float64
dtypes: float64(7), int64(2), object(9)
memory usage: 2.0+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 121 entries, 0 to 120
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Year         121 non-null    int64
 1   Population   121 non-null    object
 2   Growth Rate  121 non-null    object
dtypes: int64(1), object(2)
memory usage: 3.0+ KB
| | Year | Population | Growth Rate |
|---|---|---|---|
| 116 | 1904 | 1,792,000 | 5.29% |
| 117 | 1903 | 1,702,000 | 4.87% |
| 118 | 1902 | 1,623,000 | 4.71% |
| 119 | 1901 | 1,550,000 | 4.03% |
| 120 | 1900 | 1,490,000 | 0.00% |
1. Due to the increasing number of wildfires in California, I want to understand the relationship between wildfires and population growth.
2. I believe there is a positive correlation between the two.
1. The population consists of the wildfire reports filed in the state of California from 1950 to 2017.
2. According to the Fire and Resource Assessment Program (FRAP): "The data covered the period 1950 to 2001 and included USFS wildland fires 10 acres and greater, and CAL FIRE fires 300 acres and greater. BLM and NPS joined the effort in 2002, collecting fires 10 acres and greater. Also in 2002, CAL FIRE’s criteria expanded to include timber fires 10 acres and greater in size, brush fires 50 acres and greater in size, grass fires 300 acres and greater in size, wildland fires destroying three or more structures, and wildland fires causing $300,000 or more in damage. As of 2014, the monetary requirement was dropped and the damage requirement is 3 or more habitable structures or commercial structures."
The data is collected from reported fires, using a combination of ground-based and satellite-based data.
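Because the reporting thresholds quoted above changed in 2002, raw yearly counts before and after that year are not directly comparable. One possible way to reduce that effect, offered here as an assumption rather than anything FRAP prescribes, is to keep only fires at or above the largest size threshold in the quote (300 acres) before counting. The helper name `comparable_fire_counts` and the toy rows are hypothetical; the column names `year_` and `gis_acres` come from the fire data.

```python
import pandas as pd

def comparable_fire_counts(fires: pd.DataFrame, min_acres: float = 300.0) -> pd.Series:
    """Count fires per year, keeping only fires of at least min_acres."""
    # Rows with a missing gis_acres compare as False and are dropped.
    large = fires[fires["gis_acres"] >= min_acres]
    return large.groupby("year_").size()

# Hypothetical toy rows, only to show the shape of the result.
toy_fires = pd.DataFrame({
    "year_": [1999, 1999, 2005, 2005, 2005],
    "gis_acres": [450.0, 12.0, 15.0, 900.0, 320.0],
})
counts = comparable_fire_counts(toy_fires)
print(counts)
```

This only addresses the size criteria; the structure and damage criteria added in 2002 are left uncorrected in this sketch.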
import pandas as pd
Data_Desc = {
'column' : ['year_','fire_name', 'alarm_date', 'cause', 'gis_acres','shape_area'],
'description' : ['Fire Year','Name of the fire','Alarm date for the fire','Reason fire ignited','GIS calculated area in acres', 'Area in square meters']
}
df_data_desc = pd.DataFrame(Data_Desc)
df_data_desc
| | column | description |
|---|---|---|
| 0 | year_ | Fire Year |
| 1 | fire_name | Name of the fire |
| 2 | alarm_date | Alarm date for the fire |
| 3 | cause | Reason fire ignited |
| 4 | gis_acres | GIS calculated area in acres |
| 5 | shape_area | Area in square meters |
cause_desc ={
'cause' : [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19],
'description' : ['Lightning','Equipment Use','Smoking','Campfire','Debris', 'Railroad','Arson','Playing with Fire',
'Miscellaneous','Vehicle','Power Line','Firefighter Training','Non-Firefighter Training','Unknown/Unidentified',
'Structure', 'Aircraft','Volcanic','Escaped Prescribed Burn','Illegal Alien Campfire']
}
df_cause_desc = pd.DataFrame(cause_desc)
df_cause_desc
| | cause | description |
|---|---|---|
| 0 | 1 | Lightning |
| 1 | 2 | Equipment Use |
| 2 | 3 | Smoking |
| 3 | 4 | Campfire |
| 4 | 5 | Debris |
| 5 | 6 | Railroad |
| 6 | 7 | Arson |
| 7 | 8 | Playing with Fire |
| 8 | 9 | Miscellaneous |
| 9 | 10 | Vehicle |
| 10 | 11 | Power Line |
| 11 | 12 | Firefighter Training |
| 12 | 13 | Non-Firefighter Training |
| 13 | 14 | Unknown/Unidentified |
| 14 | 15 | Structure |
| 15 | 16 | Aircraft |
| 16 | 17 | Volcanic |
| 17 | 18 | Escaped Prescribed Burn |
| 18 | 19 | Illegal Alien Campfire |
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
# set up notebook to display multiple output in one cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
#Read files
df_ca_fire = pd.read_csv('cali_fire v2.csv')
df_ca_population = pd.read_csv('cali_population.csv')
df_ca_fire.info()
df_ca_population.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14805 entries, 0 to 14804
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   year_       14805 non-null  int64
 1   fire_name   11912 non-null  object
 2   alarm_date  13203 non-null  object
 3   cause       14805 non-null  int64
 4   gis_acres   14804 non-null  float64
 5   shape_area  14805 non-null  float64
 6   cause_desc  14805 non-null  object
dtypes: float64(2), int64(2), object(3)
memory usage: 809.8+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 121 entries, 0 to 120
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Year        121 non-null    int64
 1   Population  121 non-null    int64
dtypes: int64(2)
memory usage: 2.0 KB
# create a new dataframe with the number of fires in each year
fire_count = df_ca_fire['year_'].groupby(df_ca_fire['year_']).agg(['count'])
fire_count = fire_count.reset_index()
fire_count.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 68 entries, 0 to 67
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   year_   68 non-null     int64
 1   count   68 non-null     int64
dtypes: int64(2)
memory usage: 1.2 KB
#graph for Population by year
fig = px.line(df_ca_population, x = 'Year', y = 'Population',
title='California Population Growth')
fig.update_layout(height = 600, xaxis_title = 'Year')
#graph for fire count by year
fig = px.line(fire_count, x = 'year_', y = 'count',
title='California Fire Growth')
fig.update_layout(height = 600, xaxis_title = 'Year')
The graphs above suggest a positive relationship between the number of reported fires and the population of California: both have risen substantially over the period, which is worth investigating further.
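The visual impression from the two line charts can be checked with a number. A minimal sketch, assuming the yearly fire counts are joined to the population table on the year and a Pearson correlation is taken: the helper name `fires_vs_population` and the toy frames are hypothetical, while `year_`, `Year`, and `Population` are the column names used in this notebook.

```python
import pandas as pd

def fires_vs_population(fires: pd.DataFrame, population: pd.DataFrame) -> pd.DataFrame:
    """Join yearly fire counts to the population table on the year."""
    counts = fires.groupby("year_").size().reset_index(name="fire_count")
    # The inner join keeps only years present in both tables.
    return counts.merge(population, left_on="year_", right_on="Year", how="inner")

# Hypothetical toy data, not the real reports.
toy_fires = pd.DataFrame({"year_": [2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002, 2002]})
toy_population = pd.DataFrame({"Year": [2000, 2001, 2002], "Population": [100, 110, 120]})

merged = fires_vs_population(toy_fires, toy_population)
r = merged["fire_count"].corr(merged["Population"])  # Pearson by default
print(merged)
print(round(r, 3))
```

With the real data, the same call would take `df_ca_fire` and `df_ca_population`. A high coefficient would still not show causation, since both series trend upward over time.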
# masks separating fires by cause type: Natural, Human, or Unknown
Natural = (df_ca_fire['cause'] == 1) | (df_ca_fire['cause'] == 17)
# Human: every code except 1, 17, and 14; the tests are joined with & because joining them with | matches every row
Human = (df_ca_fire['cause'] != 1) & (df_ca_fire['cause'] != 17) & (df_ca_fire['cause'] != 14)
Unknown = (df_ca_fire['cause'] == 14)
# Total count of fires by Natural, Human, or Unknown
natural_count = df_ca_fire.loc[Natural]['cause'].count()
human_count = df_ca_fire.loc[Human]['cause'].count()
unknown_count= df_ca_fire.loc[Unknown]['cause'].count()
#pie chart
cat=['Natural','Human','Unknown']
exp=[0,0.2,0]
plt.pie([natural_count, human_count, unknown_count],labels = cat, explode = exp,autopct = '%2.1f%%')
plt.title('Cause Type Distribution')
[Figure: pie chart "Cause Type Distribution" with wedges labeled Natural, Human, and Unknown]
Based on the cause descriptions, I grouped the causes into Human, Natural, and Unknown. The chart shows that a large share of the fires between 1950 and 2017 were caused by humans.
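For the pie chart to be meaningful, the three groups have to be mutually exclusive so that each fire is counted exactly once. A minimal sketch of one way to build such a grouping, assuming codes 1 and 17 are natural, code 14 is unknown, and every other code is human-caused: the names `NATURAL_CODES`, `UNKNOWN_CODES`, and `cause_group` are hypothetical, and the toy series stands in for the `cause` column.

```python
import numpy as np
import pandas as pd

NATURAL_CODES = [1, 17]  # Lightning, Volcanic
UNKNOWN_CODES = [14]     # Unknown/Unidentified

def cause_group(cause: pd.Series) -> pd.Series:
    """Label each cause code as Natural, Unknown, or Human (everything else)."""
    labels = np.select(
        [cause.isin(NATURAL_CODES), cause.isin(UNKNOWN_CODES)],
        ["Natural", "Unknown"],
        default="Human",
    )
    return pd.Series(labels, index=cause.index, name="cause_group")

# Hypothetical toy codes, only to show that the groups partition the rows.
toy_causes = pd.Series([1, 2, 14, 7, 17, 14, 9])
groups = cause_group(toy_causes)
group_counts = groups.value_counts()
print(group_counts)
```

Because every row receives exactly one label, the group counts always add up to the number of rows, which is a quick sanity check before plotting.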
# create a new column for month
df_ca_fire['alarm_month'] = df_ca_fire['alarm_date'].str[5:7]
df_ca_fire['alarm_date'] = pd.to_datetime(df_ca_fire['alarm_date'])
df_ca_fire['alarm_month'].value_counts()
df_ca_fire.info()
df_ca_fire.head()
07    3220
08    2910
06    2126
09    1998
10     957
05     830
11     389
04     265
12     192
03     120
01     117
02      79
Name: alarm_month, dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14805 entries, 0 to 14804
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   year_        14805 non-null  int64
 1   fire_name    11912 non-null  object
 2   alarm_date   13203 non-null  datetime64[ns]
 3   cause        14805 non-null  int64
 4   gis_acres    14804 non-null  float64
 5   shape_area   14805 non-null  float64
 6   cause_desc   14805 non-null  object
 7   alarm_month  13203 non-null  object
dtypes: datetime64[ns](1), float64(2), int64(2), object(3)
memory usage: 925.4+ KB
| | year_ | fire_name | alarm_date | cause | gis_acres | shape_area | cause_desc | alarm_month |
|---|---|---|---|---|---|---|---|---|
| 0 | 2007 | OCTOBER | 2007-10-21 | 14 | 25.736713 | 1.041528e+05 | Unknown/Unidentified | 10 |
| 1 | 2007 | MAGIC | 2007-10-22 | 14 | 2824.877197 | 1.143187e+07 | Unknown/Unidentified | 10 |
| 2 | 2007 | RANCH | 2007-10-20 | 2 | 58410.335938 | 2.363782e+08 | Equipment Use | 10 |
| 3 | 2007 | EMMA | 2007-09-11 | 14 | 172.214951 | 6.969292e+05 | Unknown/Unidentified | 09 |
| 4 | 2007 | CORRAL | 2007-11-24 | 14 | 4707.997070 | 1.905259e+07 | Unknown/Unidentified | 11 |
natural_cause= df_ca_fire.loc[Natural]
human_cause= df_ca_fire.loc[Human]
unknown_cause= df_ca_fire.loc[Unknown]
#overall fire heat map prep
overall_hm = df_ca_fire['alarm_date'].groupby([df_ca_fire['year_'],df_ca_fire['alarm_month']]).agg(['count'])
overall_hm = overall_hm.reset_index()
overall_hm.rename(columns = {'count' : 'num_of_fire', 'year_' : 'year'}, inplace = True)
#natural fire heat map prep
natural_hm = natural_cause['alarm_date'].groupby([natural_cause['year_'],natural_cause['alarm_month']]).agg(['count'])
natural_hm = natural_hm.reset_index()
natural_hm.rename(columns = {'count' : 'num_of_fire', 'year_' : 'year'}, inplace = True)
#fire cause by human heat map prep
human_hm = human_cause['alarm_date'].groupby([human_cause['year_'],human_cause['alarm_month']]).agg(['count'])
human_hm = human_hm.reset_index()
human_hm.rename(columns = {'count' : 'num_of_fire', 'year_' : 'year'}, inplace = True)
#unknown fire cause heat map prep
unkown_hm = unknown_cause['alarm_date'].groupby([unknown_cause['year_'],unknown_cause['alarm_month']]).agg(['count'])
unkown_hm = unkown_hm.reset_index()  # reset the unknown-cause frame itself, not human_hm
unkown_hm.rename(columns = {'count' : 'num_of_fire', 'year_' : 'year'}, inplace = True)
overall_hm.info()
natural_hm.info()
human_hm.info()
unkown_hm.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 674 entries, 0 to 673
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   year         674 non-null    int64
 1   alarm_month  674 non-null    object
 2   num_of_fire  674 non-null    int64
dtypes: int64(2), object(1)
memory usage: 15.9+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284 entries, 0 to 283
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   year         284 non-null    int64
 1   alarm_month  284 non-null    object
 2   num_of_fire  284 non-null    int64
dtypes: int64(2), object(1)
memory usage: 6.8+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 674 entries, 0 to 673
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   year         674 non-null    int64
 1   alarm_month  674 non-null    object
 2   num_of_fire  674 non-null    int64
dtypes: int64(2), object(1)
memory usage: 15.9+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 674 entries, 0 to 673
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   index        674 non-null    int64
 1   year         674 non-null    int64
 2   alarm_month  674 non-null    object
 3   num_of_fire  674 non-null    int64
dtypes: int64(3), object(1)
memory usage: 21.2+ KB
# Overall heat map
value = overall_hm.pivot(index="alarm_month", columns="year", values="num_of_fire")
f,ax = plt.subplots(figsize=(10,7))
sns.heatmap(value, cmap="YlGnBu")
plt.title('California Heat Map by year')
<AxesSubplot:xlabel='year', ylabel='alarm_month'>
Text(0.5, 1.0, 'California Heat Map by year')
# Natural heat map
value = natural_hm.pivot(index="alarm_month", columns="year", values="num_of_fire")
f,ax = plt.subplots(figsize=(10,7))
sns.heatmap(value, cmap="YlGnBu")
plt.title('Natural Fire Heat Map by year')
<AxesSubplot:xlabel='year', ylabel='alarm_month'>
Text(0.5, 1.0, 'Natural Fire Heat Map by year')
# Human heat map
value = human_hm.pivot(index="alarm_month", columns="year", values="num_of_fire")
f,ax = plt.subplots(figsize=(10,7))
sns.heatmap(value, cmap="YlGnBu")
plt.title('Human Caused Fire Heat Map by year')
<AxesSubplot:xlabel='year', ylabel='alarm_month'>
Text(0.5, 1.0, 'Human Caused Fire Heat Map by year')
# Unknown heat map
value = unkown_hm.pivot(index="alarm_month", columns="year", values="num_of_fire")
f,ax = plt.subplots(figsize=(10,7))
sns.heatmap(value, cmap="YlGnBu")
plt.title('Unknown Caused Fire Heat Map by year')
<AxesSubplot:xlabel='year', ylabel='alarm_month'>
Text(0.5, 1.0, 'Unknown Caused Fire Heat Map by year')
# Filter to the 100 largest fires overall (by shape_area)
top100_fires=df_ca_fire.sort_values(by=['shape_area'], inplace=False, ascending=False)[:100]
# Count rows per year with size() so fires with a missing alarm_date are still counted
top100_fires_cnt = top100_fires.groupby('year_').size().reset_index(name='count')
# Bar chart showing the distribution of the top 100 fires by year
fig, ax = plt.subplots(figsize=(8, 8))
top100_fires_cnt.sort_values(by='count').plot.barh(x='year_',
y='count',
ax=ax,
color="orange")
ax.set_title("Top 100 California Fires(1950 - 2017) Count by Year")
plt.show()
<AxesSubplot:ylabel='year_'>
Text(0.5, 1.0, 'Top 100 California Fires(1950 - 2017) Count by Year')
The heat maps show that each year's fire season runs roughly from May to October. Natural fires occur consistently within the fire season. Human-caused fires, on the other hand, appear to occur more often and to extend beyond the fire season over the years.
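The claim that fires extend beyond the May to October season can be expressed as a single number per year: the share of dated fires whose alarm month falls inside the season. A minimal sketch under that assumption follows; the helper name `in_season_share` and the toy rows are hypothetical, and `year_` and `alarm_date` are the columns used earlier.

```python
import pandas as pd

def in_season_share(fires: pd.DataFrame, start_month: int = 5, end_month: int = 10) -> pd.Series:
    """Per year, the fraction of dated fires whose alarm month is in the season."""
    dated = fires.dropna(subset=["alarm_date"])
    month = pd.to_datetime(dated["alarm_date"]).dt.month
    in_season = month.between(start_month, end_month)  # inclusive on both ends
    return in_season.groupby(dated["year_"]).mean()

# Hypothetical toy rows, not the real reports.
toy_fires = pd.DataFrame({
    "year_": [1960, 1960, 2010, 2010, 2010, 2010],
    "alarm_date": ["1960-07-04", "1960-08-15", "2010-01-20",
                   "2010-07-01", "2010-11-30", "2010-09-09"],
})
share = in_season_share(toy_fires)
print(share)
```

A share that falls over the years would be consistent with fires spreading outside the season; applying the function separately to the natural and human-caused subsets would allow the two to be compared.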
The "Top 100 California Fires(1950 - 2017) Count by Year" chart also shows that larger fires have become more common in recent years.
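Yearly bars can be noisy, so one possible complement is to group the largest fires by decade. The sketch below assumes "largest" means the top `n` rows by `shape_area`, as in the chart; the helper name `largest_fires_by_decade` and the toy rows are hypothetical.

```python
import pandas as pd

def largest_fires_by_decade(fires: pd.DataFrame, n: int = 100) -> pd.Series:
    """Count how many of the n largest fires (by shape_area) fall in each decade."""
    top = fires.nlargest(n, "shape_area")
    decade = (top["year_"] // 10) * 10  # e.g. 2007 -> 2000
    return decade.value_counts().sort_index()

# Hypothetical toy rows, only to show the shape of the result.
toy_fires = pd.DataFrame({
    "year_": [1955, 1987, 2003, 2007, 2015, 2017],
    "shape_area": [5.0, 1.0, 9.0, 8.0, 7.0, 6.0],
})
by_decade = largest_fires_by_decade(toy_fires, n=4)
print(by_decade)
```

Decades with none of the largest fires simply do not appear in the result, so a reader comparing decades may want to reindex it over the full 1950 to 2010 range.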
The first chart shows how much California's population has grown over the period. Although natural fires are becoming more frequent, the number of reported human-caused fires has risen sharply. In conclusion, the data shows a positive correlation between population and the number of fires reported in California.